Project Status: Active – The project has reached a stable, usable state and is being actively developed.


Introduction

The aim of this practical is to give you an understanding of the common appraches that are used to process and analyse animal movement data. We will be doing this in the R statistical framework and will introduce you to some commonly used R packages.

During this practical you will:

  1. Load in an example dataset of animal locations and explore the data with some preliminary plots
  2. Produce a map of the study area in an appropriate spatial projection for the region
  3. Load in an example dive summary dataset and explore the data with some preliminary plots
  4. Error correct and regularise the animal location data and combine them with dive summary data
  5. Extract example environmental covariates from the regions the animal utilises and append them to the dataframe
  6. Perform a preliminary statistical analysis to better understand the drivers of animal movement

This practical will be based on the tidyverse style of R coding. Tidyverse is a collection of R packages designed for data science. The key for this practical is that by learning to structure our data in a ‘tidy’ way we can use the same tools in similar ways for different datasets. This helps to make our data processing and analysis reproducable.

If reproducable and open science is something you are interested in have a look at this paper from Lowndes et al. 2017 and this blog post on Tidy Data for reproducibility, efficiency and collaboration.

   

Idealised R for data science workflow envisioned by Dr Julia Lowndes. Artwork by Allison Horst

   


Toolbox

During this practical you will become familiar with using pipes %>% to perform data manipulations, using ggplot2 to generate publication quality figures, and using the sf package to handle spatial data.

For more information on the tidyverse check out the Wickham & Grolemund book ‘R for Data Science’. You can access it for free online here:
https://r4ds.had.co.nz

The project website can be accessed here:
https://www.tidyverse.org

For more information on the sf project check out:
https://r-spatial.github.io/sf/

This practical will require that you have first downloaded and installed the following R packages: tidyverse, rnaturalearth, rnaturalearthdata, sf, viridis, lubridate, foieGras, marmap, raster, rgdal

If you run into issues with the foieGras package later on in the practical you may need to install TMB from source (e.g. install.packages("TMB", type = "source")). This will likely depend on which version of R you are using.

I would recommend opening a new session in R Stuidio, starting a new project, opening and saving an R script in this session. This is a sensible way to curate and keep track of your files, and will be particularly useful when it comes to managing the data and analysis for your research projects. When you close R you will be able to reopen the session via the project and all your files will be loaded ready to continue the analysis. For specific instructions look at Chapter 8 in the R for Data Science. Bonus points if you also start to use the here package.

   

Use R projects not setwd()!. Artwork by Allison Horst

   


Background

In March 2019 I was part of an expedition to deploy SMRU SRDL devices on harp seal pups in the Gulf of St Lawrence, Canada. Harp seals give birth on the sea ice when it is at it’s most southerly extent in late Winter/ early Spring. The females wean the pups after around 2 weeks, and the pups start to enter the water and feed for themselves approximately 2 weeks after weaning. We deployed tags on 12 pups that were approximately 3 weeks old, and tracked their movements as they left the sea ice and began their first migration north.

The tags have now stopped transmitting, but you can explore a map of the raw unfiltered location data here:
https://jamesgrecian.shinyapps.io/harpmap/

   

Harp seal pup with SMRU SRDL. Gulf of St Lawrence March 2019

   


Data

SMRU Satellite Relay Data Loggers (SRDLs) collect a wide range of information and transmit the data through the Argos satellite network. In this practical we will be considering the location data and the dive summary data.

Data for 6 of these animals are available to you via GitHub, you can load this straight into R using the read_csv command:

locs <- read_csv("https://raw.githubusercontent.com/jamesgrecian/BL5122-dev/master/data/hp6_diag.csv") dives <- read_csv("https://raw.githubusercontent.com/jamesgrecian/BL5122-dev/master/data/hp6_summary.csv")

The locs data frame is the “diag” data from the tags. This has columns containing information on the animal id, the date and time of the fix, the estimated quality of the argos location, the estimated longitude and latitude, as well as information on the error ellipses from the Argos Kalman Filter (a more detailed measure of the location quality). Refer to your lecture notes for more information on this.

The dives data frame is the “summary” data from the tags and contains 6 hourly summary information for all the dives performed by an individual. Some of the information has been truncated for this practical, but there are columns containing the start time of the 6 hour period, the proportion of time an individual spent at the surface, diving, or hauled out, the number of dives finished during the period, the average dive depth, maximum dive depth, average dive duration and maximum dive duration.

For more information on the data collected by the tags see here:
http://www.smru.st-andrews.ac.uk/protected/specs/DatabaseFieldDescriptions.pdf

You’ll notice these data are in a “tidy” format, this might have been described to you as ‘narrow data’ or ‘long data’. Each variable forms a column, each observation forms a row and each cell is a single measurement.

   

What is tidy data? Artwork by Allison Horst

   

This may seem obvious, but by ensuring the data are in a standard format the steps we take with our analysis should be repeatable on other datasets with the same format. For example, the steps we take with this practical should be easily applied to other SMRU SRDL data, or any tag data that comes in a similar format.

   

Tidy data workbench. Artwork by Allison Horst

   


Practical Tasks

We are now going to work through an example analysis of the animal movement data in 6 steps. Each step is a seperate task. Work through the task description and have a go at trying to write some R code to complete the task. Most of the R functions you might need are listed in the task description text. If you get stuck there are demonstrators on hand to help.

Once you have run your R code have a look at the ‘code’ tab. This contains an example script demonstrating how I might answer the task - I’m sure it’s not the only way. Try to avoid jumping straight to the code and copying and pasting it into your console. The answer tab contains example R output and figures for you to compare against.


Task 1: Load in example location data and explore

Task

Use the read_csv function from the tidyverse series of packages to load the location dataset into R.

You can look at the top 10 rows of the dataframe by typing the object name: locs, you can look at a summary of the dataframe by typing summary(locs).

Generate some preliminary plots to explore the data, try plotting using the ggplot command. Have a look at the help file by typing ?ggplot

If you haven’t used ggplot before, it is based on “The Grammar of Graphics”. Have a look at the tidyverse help page, it has some great cheat sheets: https://ggplot2.tidyverse.org

   

Build a masterpiece with ggplot2. Artwork by Allison Horst

   

Every plot (even a map) has the same building blocks; a data set, coordinate system, and aesthetics. For example here is a simple plot of the longitude and latitude data from the tag dataset:

ggplot() + geom_point(aes(x = lon, y = lat), data = locs)

Use this code to plot the migratory paths of the animals using the lon and lat column from the SRDL diag data. Try using the facet_wrap command to generate seperate plots for each animal (the id column).

Try plotting other parts of the locs dataframe, what patterns do you see emerge?

Can you generate a plot to illustrate when the animals begin their northward migration? Try plotting latitude against time.

Can you identify gaps in the data, or poor quality locations due to issues with satellite coverage? Perhaps colour the locations using the lc column.

How long do the tags last? Can you generate a summary table? Have a look at ?dplyr::summarise for some help


Code

require(tidyverse)

# load in the data
locs <- read_csv("https://raw.githubusercontent.com/jamesgrecian/BL5122-dev/master/data/hp6_diag.csv")

locs # check the data

summary(locs) # generate a summary

# lon lat plot coloured by animal id
p1 <- ggplot() +
  geom_point(aes(x = lon, y = lat, colour = id), data = locs) +
  scale_color_viridis_d() +
  facet_wrap(~id, nrow = 3)
print(p1)

# change in lat through time
p2 <- ggplot() +
  geom_point(aes(x = date, y = lat, colour = id), data = locs) +
  scale_color_viridis_d() +
  facet_wrap(~id, nrow = 3)
print(p2)

# lon lat plot coloured by argos location class
p3 <- ggplot() +
  geom_point(aes(x = lon, y = lat, colour = lc), data = locs) +
  scale_color_viridis_d() +
  facet_wrap(~id, nrow = 3)
print(p3)

# how long do the tags last?
locs %>% 
  group_by(id) %>% 
  summarise(max(date) - min(date)) 

Answer

## # A tibble: 21,502 x 8
##    id         date                lc      lon   lat  smaj  smin   eor
##    <chr>      <dttm>              <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 hp6-747-19 2019-03-23 00:08:33 3     -59.6  48.1   269    93    97
##  2 hp6-747-19 2019-03-23 00:25:33 3     -59.6  48.1   338    75    69
##  3 hp6-747-19 2019-03-23 00:37:38 1     -59.6  48.1  4984   144    83
##  4 hp6-747-19 2019-03-23 10:20:14 3     -59.7  48.2   358    89   116
##  5 hp6-747-19 2019-03-23 10:22:18 3     -59.7  48.2   239    82    96
##  6 hp6-747-19 2019-03-23 15:07:47 2     -59.7  48.2  1345    76   132
##  7 hp6-747-19 2019-03-23 15:36:07 3     -59.7  48.2   665    48    90
##  8 hp6-747-19 2019-03-23 20:03:42 B     -59.7  48.2  4369   528    42
##  9 hp6-747-19 2019-03-23 20:13:27 2     -59.7  48.3  1890    72    84
## 10 hp6-747-19 2019-03-24 01:25:02 3     -59.7  48.3   271    70    82
## # … with 21,492 more rows
##       id                 date                          lc           
##  Length:21502       Min.   :2019-03-23 00:02:21   Length:21502      
##  Class :character   1st Qu.:2019-05-22 08:30:23   Class :character  
##  Mode  :character   Median :2019-07-18 00:56:36   Mode  :character  
##                     Mean   :2019-07-12 07:13:21                     
##                     3rd Qu.:2019-08-30 17:45:37                     
##                     Max.   :2019-11-21 20:32:37                     
##       lon              lat             smaj             smin       
##  Min.   :-70.74   Min.   :44.70   Min.   :     0   Min.   :     9  
##  1st Qu.:-62.30   1st Qu.:49.43   1st Qu.:  2476   1st Qu.:   108  
##  Median :-59.69   Median :52.53   Median :  5274   Median :   410  
##  Mean   :-57.49   Mean   :56.00   Mean   : 15894   Mean   :   842  
##  3rd Qu.:-55.68   3rd Qu.:60.98   3rd Qu.: 11815   3rd Qu.:   920  
##  Max.   :-13.04   Max.   :76.27   Max.   :986216   Max.   :288529  
##       eor        
##  Min.   :  0.00  
##  1st Qu.: 78.00  
##  Median : 89.00  
##  Mean   : 88.87  
##  3rd Qu.: 99.00  
##  Max.   :180.00

## # A tibble: 6 x 2
##   id         `max(date) - min(date)`
##   <chr>      <drtn>                 
## 1 hp6-747-19 243.8500 days          
## 2 hp6-749-19 197.8026 days          
## 3 hp6-751-19 199.9178 days          
## 4 hp6-753-19 111.0021 days          
## 5 hp6-755-19 178.7776 days          
## 6 hp6-756-19 187.0016 days

Task 2: Produce a map of the study area

Task

Using the sf package alongside ggplot2 and the tidyverse it is possible to generate publication quality maps in R.

You will need to download an appropriate shapefile of the world to add to the map. Have a look at the rnaturalearth package. You can generate a plot of the shapefile using the geom_sf function.

   

Simple spatial data analysis with sf. Artwork by Allison Horst

   

Once you have generated a simple map, try adding the animal location data. To do this you will need to add projection information to the location dataframe.

You can do this using the st_as_sf and st_set_crs commands from sf. Latitude and Longitude information is typically referred to as WGS84 or CRS code 4326

For more information have a look at the epsg.io website: https://epsg.io

Harp seals are an Arctic species, so the WGS84 projection means it is hard to see what is going on. We can change the projection of the map to remove some of this distortion.

One option might be to try a polar stereographic projection, although the Gulf of St Lawrence isn’t really “polar”: "proj=stere"

Another option might be to try a Lambert azimuthal equal-area projection, This minimises distortion around the area of interest, and you can customise the projection to be centred on the study area: "+proj=laea"

N.B. These projection code examples aren’t complete…

Think about what projection may be appropriate and try adding it to the plot using the coord_sf command.

If you are looking for some more tips, Robin Lovelace has a fantastic book on Geocomputation with R that you can access for free here: https://geocompr.robinlovelace.net


Code

require(sf)
#install.packages("rnaturalearth")
#install.packages("rnaturalearthdata")
require(rnaturalearth)

# Generate a global shapefile and a simple plot
world <- ne_countries(scale = "medium", returnclass = "sf")

p4 <- ggplot() +
  theme_bw() +
  geom_sf(aes(), data = world)
print(p4)

# Create an sf version of the locs data with a WGS84 projection and add to the plot
locs_sf <- locs %>% st_as_sf(coords = c('lon', 'lat')) %>% st_set_crs(4326)

# Add the locations to the WGS map
print(p4 + geom_sf(aes(colour = id), data = locs_sf, show.legend = "point"))

# To generate a plot with less distortion first define a projection i.e. Lambert Azimuthal Equal Area
prj = "+proj=laea +lat_0=60 +lon_0=-50 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0"

p5 <- ggplot() +
  theme_bw() +
  geom_sf(aes(), data = world) +
  coord_sf(crs = prj) +
  geom_sf(aes(colour = id), data = locs_sf, show.legend = "point") +
  scale_color_viridis_d() +
  coord_sf(xlim = c(-2500000, 2500000), ylim = c(-2500000, 2500000), crs = prj, expand = T) +
  scale_x_continuous(breaks = seq(from = -130, to = 30, by = 10))
print(p5)

Answer


Task 3: Load in an example dive summary dataset and explore

Task

Load in the dive summary data and generate some preliminary plots to visualise the data.
Hint - you’ve already written similar code for the previous dataset.

This data is transmitted via the Argos system, do we have all the 6 hour summary data from all the tags or are there gaps in transmission?
Hint - look at the group-by function to group the data for each individual, then count the rows using tally

These animals were tagged as newly weaned pups, can you see changes in their diving behaviour over time?
Hint - try plotting different dive parameters as a time series using ggplot


Code

dives <- read_csv("https://raw.githubusercontent.com/jamesgrecian/BL5122-dev/master/data/hp6_summary.csv")

#how many dive records do we have?
dives %>% group_by(id) %>% tally()

p6 <- ggplot() +
  geom_point(aes(x = date, y = haul_time/60),
             data = dives %>% filter(date < "2019-06-23")) +
  geom_smooth(aes(x = date, y = haul_time/60),
              formula = "y ~ x", method = "loess", span = 0.5,
              data = dives %>% filter(date < "2019-06-23")) +
  facet_wrap(~id, nrow = 3)
print(p6)

Answer

## # A tibble: 6 x 2
##   id             n
##   <chr>      <int>
## 1 hp6-747-19   233
## 2 hp6-749-19   595
## 3 hp6-751-19   586
## 4 hp6-753-19   290
## 5 hp6-755-19   676
## 6 hp6-756-19   562


Take a break

   

Hey there - you’re half way through! Take a break…

   


Task 4: Regularise location data and combine with dive summary data

Task

If we want to explore how the dive behaviour of these animals changes as they migrate we need to link the surface movements to the dive summary data.

However, if you look at the time stamps of both the datasets they do not match:
head(locs$date)
head(dives$date)

This is because, while the dive data are summarised regularly, the location data are only collected when a satellite passes overhead and the animal is at the surface.

It is common practice to remove erroneous fixes and regularise the location data before conducting statistical analysis. We can do this using a type of state space model known as a random walk filter.

This can be fitted very quickly using the new foieGras package. Have a look at the help file by typing browseVignettes(package = "foieGras")

You will need to generate a sequence of times that you want the model to estimate locations for. Add these to the time.step argument in the fit_ssm command.
Hint - the idea here is to find the first and last datetime for each animal and generate a regular sequence of time steps at 6 hour intervals. There will be a range of different approaches to try this. However, this is one of the harder elements of the practical - feel free to look directly at the code.

Once you have run the model (which will take a few minutes), extract the predicted locations using the grab command.

This should give you a table of locations that match the time stamps in the dive data. Combine the columns from these two datasets using the left_join command. The output should be a dataframe containing animal id, datetime, lon, lat and the dive summary data.


Code

require(lubridate)
require(foieGras)

# Create a regular series of datetimes that can be matched to the dive data
# These will be at 6 hour intervals starting at midnight
times <- locs %>% group_by(id) %>% summarise(min = floor_date(min(locs$date), "6 hours"),
                                             max = ceiling_date(max(locs$date), "6 hours")) %>%
  mutate(date = map2(min, max, seq, by = "6 hours")) %>% # Create a list column with dates
  unnest(cols = c(date)) %>%
  select(id, date) %>%
  ungroup

# Regularise the location data to match the 6 hour dive summary data 
# Fit a continuous time random walk using the Argos least squares data
fit <- fit_ssm(locs, model = "rw", time.step = times)

# Extract fitted values from model
plocs <- grab(fit, "predicted", as_sf = F)

# combine regularised locations with dive data
dat <- left_join(plocs, dives, by = c("id" = "id", "date" = "date"))
print(dat)

Answer

## fitting rw...
## 
 pars:   2.13419 2.43339 0.53013      
 pars:   1.95112 3.41635 0.54657      
 pars:   2.04326 2.92163 0.53829      
 pars:   1.57165 2.77577 0.5932      
 pars:   1.92769 2.81753 0.55207      
 pars:   1.8561 2.95573 0.56467      
 pars:   1.90841 2.85903 0.55541      
 pars:   1.87313 2.83152 0.56553      
 pars:   1.8978 2.85075 0.55845      
 pars:   1.89096 2.86122 0.56429      
 pars:   1.8978 2.85075 0.55845      
## 
 pars:   1.44489 1.36307 -0.08551      
 pars:   1.81693 2.29078 -0.05466      
 pars:   2.8089 2.17894 0.0043      
 pars:   2.09886 2.3958 -0.03667      
 pars:   2.12022 2.21747 0.20537      
 pars:   2.06357 2.35097 -0.01948      
 pars:   2.05954 2.38138 0.03161      
 pars:   2.07714 2.35257 0.08071      
 pars:   2.04559 2.36433 0.12988      
 pars:   2.0613 2.36735 0.09266      
 pars:   2.07575 2.37122 0.11237      
 pars:   2.0613 2.36735 0.09266      
## 
 pars:   1.11893 1.1341 -0.43822      
 pars:   1.47641 2.06672 -0.38893      
 pars:   2.44483 1.92567 -0.59454      
 pars:   1.71724 2.1535 -0.42457      
 pars:   1.67196 2.39673 -0.49932      
 pars:   1.70111 2.24013 -0.45119      
 pars:   1.6627 2.20483 -0.52703      
 pars:   1.70757 2.24972 -0.5937      
 pars:   1.70494 2.21421 -0.67858      
 pars:   1.72664 2.25267 -0.75935      
 pars:   1.73343 2.22658 -0.84736      
 pars:   1.73343 2.22658 -0.84736      
## 
 pars:   0.9069 0.78231 -0.05739      
 pars:   1.31049 1.6971 -0.07378      
 pars:   2.19527 1.23302 -0.11611      
 pars:   1.38648 1.51152 -0.07779      
 pars:   1.55454 1.59946 -0.14299      
 pars:   1.40359 1.52896 -0.08314      
 pars:   1.40407 1.51591 -0.10447      
 pars:   1.40098 1.528 -0.12615      
 pars:   1.40098 1.528 -0.12615      
## 
 pars:   0.81432 0.86686 0.06734      
 pars:   1.13276 1.81456 0.04558      
 pars:   2.13036 1.85043 -0.01374      
 pars:   1.37912 2.50875 -0.06127      
 pars:   1.77743 2.15971 -0.03607      
 pars:   1.47144 1.80576 -0.07851      
 pars:   1.68204 2.04937 -0.0493      
 pars:   1.56042 2.10942 -0.10456      
 pars:   1.63456 2.07281 -0.07088      
 pars:   1.63575 2.04915 -0.12291      
 pars:   1.64212 2.0857 -0.23109      
 pars:   1.64304 2.04137 -0.3365      
 pars:   1.64018 2.04958 -0.26567      
 pars:   1.64433 2.06631 -0.31266      
 pars:   1.64433 2.06631 -0.31266      
## 
 pars:   1.5975 1.57378 -0.28696      
 pars:   1.7928 2.55423 -0.2631      
 pars:   2.65601 3.02297 -0.4506      
 pars:   2.31561 3.96309 -0.46814      
 pars:   1.53315 3.45724 -0.83128      
 pars:   2.24616 3.62137 -0.47578      
 pars:   1.92002 3.50982 -0.52916      
 pars:   2.21555 3.52104 -0.47827      
 pars:   2.11151 3.50798 -0.47452      
 pars:   2.1939 3.49625 -0.47882      
 pars:   2.17285 3.5211 -0.47401      
 pars:   2.18817 3.50302 -0.47751      
 pars:   2.18305 3.49762 -0.47251      
 pars:   2.18305 3.49762 -0.47251
## # A tibble: 5,862 x 18
##    id     date                  lon   lat      x     y   x.se  y.se surface_time
##    <chr>  <dttm>              <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>        <dbl>
##  1 hp6-7… 2019-03-23 00:00:00 -59.6  48.1 -6634. 6088.  0.969  2.47          0  
##  2 hp6-7… 2019-03-23 06:00:00 -59.6  48.1 -6639. 6100. 22.5   58.4          18.9
##  3 hp6-7… 2019-03-23 12:00:00 -59.7  48.2 -6643. 6107.  9.64  25.0           0  
##  4 hp6-7… 2019-03-23 18:00:00 -59.7  48.2 -6643. 6116. 10.5   27.1           0  
##  5 hp6-7… 2019-03-24 00:00:00 -59.7  48.3 -6642. 6123.  8.85  23.0           0  
##  6 hp6-7… 2019-03-24 06:00:00 -59.6  48.3 -6637. 6122. 23.9   61.4           0  
##  7 hp6-7… 2019-03-24 12:00:00 -59.6  48.3 -6631. 6119.  8.16   6.97          0  
##  8 hp6-7… 2019-03-24 18:00:00 -59.6  48.3 -6633. 6118. 18.9   48.7          13.4
##  9 hp6-7… 2019-03-25 00:00:00 -59.5  48.2 -6629. 6115. 11.3   29.3           0  
## 10 hp6-7… 2019-03-25 06:00:00 -59.5  48.2 -6627. 6112. 21.6   56.0          NA  
## # … with 5,852 more rows, and 9 more variables: dive_time <dbl>,
## #   haul_time <dbl>, n_dives <dbl>, av_depth <dbl>, max_depth <dbl>,
## #   sd_depth <lgl>, av_duration <dbl>, sd_duration <dbl>, max_duration <dbl>

Task 5: Extract environmental covariates and append to dataframe

Task

Now that we have a regularised dataset with location and dive information, we can extract some environmental data to allow us to explore the drivers of movement.

The marmap package contains a function to download global bathymetry data from the ETOPO1 global relief model at up to 1 minute (~1.8 km) resolution.

Have a look at the getNOAA.bathy function. You will need to supply the coordinates of the bounding box you want data for, and the resolution you need. The bounding box information can be taken from the location dataframe that you generated as part of Task 2, have a look at the st_bbox function.

Once you have downloaded the data, to generate a nice plot of the bathymetry data using ggplot, you will need to convert it to a raster object, project it to your chosen projection, and then convert it to a dataframe. Have a look at the projectRaster, rasterToPoints and geom_raster functions.

Finally, we need to append geolocated bathymetry data to the animal movement dataframe. We can do this using the extract function in the raster package.

Extracting environmental data from raster layers at a particular location is a very common task for ecologists. You may already know how to do this using GIS software. Developing the ability to do it in R means it’s straightforward to scale up the task to much larger datasets, or even to automate it.


Code

#Download bathymetry data from ETOPO1 database hosted on the NOAA website
#install.packages("marmap")
require(marmap)
bathy <- getNOAA.bathy(lon1 = st_bbox(locs_sf)[1],
                       lon2 = st_bbox(locs_sf)[3],
                       lat1 = st_bbox(locs_sf)[2],
                       lat2 = st_bbox(locs_sf)[4], resolution = 5)
bat = marmap::as.raster(bathy)

# Define a projection and transform bathymetry data
r = raster::projectRaster(bat, crs = sp::CRS(prj), method = 'ngb', res = 10000)
bat_df = as.data.frame(raster::rasterToPoints(r))
names(bat_df) = c("x", "y", "depth")

p7 <- ggplot() +
  theme_bw() +
  geom_raster(aes(x = x, y = y, fill = depth), data = bat_df) +
  geom_sf(aes(), data = world) +
  geom_sf(aes(colour = id), data = locs_sf, show.legend = "point") +
  scale_color_viridis_d() +
  coord_sf(xlim = c(-2500000, 2500000), ylim = c(-2500000, 2500000), crs = prj, expand = T) +
  scale_x_continuous(breaks = seq(from = -130, to = 30, by = 10)) + 
  xlab("") + ylab("")
print(p7)

#Extract the bathymetry underlying each point
dat <- dat %>%
  mutate(bathy = raster::extract(bat, as.matrix(dat[c("lon", "lat")])))
dat

Answer

## # A tibble: 5,862 x 19
##    id     date                  lon   lat      x     y   x.se  y.se surface_time
##    <chr>  <dttm>              <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>        <dbl>
##  1 hp6-7… 2019-03-23 00:00:00 -59.6  48.1 -6634. 6088.  0.969  2.47          0  
##  2 hp6-7… 2019-03-23 06:00:00 -59.6  48.1 -6639. 6100. 22.5   58.4          18.9
##  3 hp6-7… 2019-03-23 12:00:00 -59.7  48.2 -6643. 6107.  9.64  25.0           0  
##  4 hp6-7… 2019-03-23 18:00:00 -59.7  48.2 -6643. 6116. 10.5   27.1           0  
##  5 hp6-7… 2019-03-24 00:00:00 -59.7  48.3 -6642. 6123.  8.85  23.0           0  
##  6 hp6-7… 2019-03-24 06:00:00 -59.6  48.3 -6637. 6122. 23.9   61.4           0  
##  7 hp6-7… 2019-03-24 12:00:00 -59.6  48.3 -6631. 6119.  8.16   6.97          0  
##  8 hp6-7… 2019-03-24 18:00:00 -59.6  48.3 -6633. 6118. 18.9   48.7          13.4
##  9 hp6-7… 2019-03-25 00:00:00 -59.5  48.2 -6629. 6115. 11.3   29.3           0  
## 10 hp6-7… 2019-03-25 06:00:00 -59.5  48.2 -6627. 6112. 21.6   56.0          NA  
## # … with 5,852 more rows, and 10 more variables: dive_time <dbl>,
## #   haul_time <dbl>, n_dives <dbl>, av_depth <dbl>, max_depth <dbl>,
## #   sd_depth <lgl>, av_duration <dbl>, sd_duration <dbl>, max_duration <dbl>,
## #   bathy <dbl>

Task 6: Examine the drivers of movement

Task

The data frame you have now generated and familiarised yourself with is the type that movement ecologists will often create when analysing their data. The same steps can be followed to extract data from pretty much any environmental layer that you might be interested in. In this next task I’m going to get you to examine how the behaviour of the animal changes over the migration, and to see whether this may be linked to the localised bathymetry.

In the lecture I talked about “move persistence” as a measure of animal behaviour, and how it can be useful to identify periods of residencey and periods of migratory behavior.

Using the foieGras package have a look at the fit_mpm function. This can be used to estimate the move persistence of each individual along their migratory path.

Once you have run a model, try replotting your map of the animal movements but with the locations coloured by the behaviour g.

Finally, try plotting move persistence against bathymetry to examine how behaviour may change in response to the environment.


Code

# Fit the move persistence model to the data
fmpm <- fit_mpm(dat, model = "jmpm")

# Extract fitted states from model
fmpm_g <- grab(fmpm, "fitted", as_sf = F)

# Combine the behaviour with the dataframe
dat <- left_join(dat, fmpm_g, by = c("id" = "id", "date" = "date"))
print(dat)

# Generate a map coloured by the states
p8 <- ggplot() + 
  theme_bw() +
  geom_sf(aes(), data = world) +
  geom_sf(aes(colour = g),
             data = dat %>% 
               st_as_sf(coords = c("lon", "lat")) %>%
               st_set_crs('+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs') %>%
               st_transform(prj)) +
  scale_colour_viridis_c("g", limits = c(0,1)) +
  coord_sf(xlim = c(-2500000, 2500000), ylim = c(-2500000, 2500000), crs = prj, expand = T) +
  scale_x_continuous(breaks = seq(from = -130, to = 30, by = 10)) + 
  xlab("") + ylab("") +
  facet_wrap(~id)
print(p8)

p9 <- ggplot() +
  geom_point(aes(x = bathy, y = g, colour = g), data = dat) +
  scale_colour_viridis_c("g", limits = c(0,1)) +
  xlab("Bathymetry (m)") + ylab("Move persistence (g)") +
  facet_wrap(~id)
print(p9)

Answer

## fitting jmpm...
## 
 pars:   0 0 0      
 pars:   -0.70205 -0.71207 -0.00902      
 pars:   -2.80822 -2.84827 -0.03606      
 pars:   0.50683 -5.07711 0.17061      
 pars:   -2.40875 -2.83615 -0.0194      
 pars:   -2.29927 -3.06748 -0.32681      
 pars:   -2.32487 -2.86763 -0.0613      
 pars:   -2.2388 -2.8233 -0.08151      
 pars:   -2.27885 -2.84564 -0.16914      
 pars:   -2.24413 -2.77943 -0.35228      
 pars:   -2.26601 -2.82114 -0.23691      
 pars:   -2.25869 -2.86011 -0.29844      
 pars:   -2.31404 -2.83119 -0.33662      
 pars:   -2.27272 -2.83585 -0.3044      
 pars:   -2.26976 -2.84523 -0.33131      
 pars:   -2.25225 -2.82424 -0.38168      
 pars:   -2.2887 -2.82508 -0.42591      
 pars:   -2.25815 -2.85251 -0.46589      
 pars:   -2.25561 -2.81611 -0.51008      
 pars:   -2.27612 -2.83341 -0.56072      
 pars:   -2.25306 -2.83455 -0.61317      
 pars:   -2.25306 -2.83455 -0.61317
## # A tibble: 5,862 x 21
##    id     date                  lon   lat      x     y   x.se  y.se surface_time
##    <chr>  <dttm>              <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>        <dbl>
##  1 hp6-7… 2019-03-23 00:00:00 -59.6  48.1 -6634. 6088.  0.969  2.47          0  
##  2 hp6-7… 2019-03-23 06:00:00 -59.6  48.1 -6639. 6100. 22.5   58.4          18.9
##  3 hp6-7… 2019-03-23 12:00:00 -59.7  48.2 -6643. 6107.  9.64  25.0           0  
##  4 hp6-7… 2019-03-23 18:00:00 -59.7  48.2 -6643. 6116. 10.5   27.1           0  
##  5 hp6-7… 2019-03-24 00:00:00 -59.7  48.3 -6642. 6123.  8.85  23.0           0  
##  6 hp6-7… 2019-03-24 06:00:00 -59.6  48.3 -6637. 6122. 23.9   61.4           0  
##  7 hp6-7… 2019-03-24 12:00:00 -59.6  48.3 -6631. 6119.  8.16   6.97          0  
##  8 hp6-7… 2019-03-24 18:00:00 -59.6  48.3 -6633. 6118. 18.9   48.7          13.4
##  9 hp6-7… 2019-03-25 00:00:00 -59.5  48.2 -6629. 6115. 11.3   29.3           0  
## 10 hp6-7… 2019-03-25 06:00:00 -59.5  48.2 -6627. 6112. 21.6   56.0          NA  
## # … with 5,852 more rows, and 12 more variables: dive_time <dbl>,
## #   haul_time <dbl>, n_dives <dbl>, av_depth <dbl>, max_depth <dbl>,
## #   sd_depth <lgl>, av_duration <dbl>, sd_duration <dbl>, max_duration <dbl>,
## #   bathy <dbl>, g <dbl>, g.se <dbl>


Conclusions

During this practical you have been introduced to a number of tools that are used to process and analyse animal movement data. These tools have been based in the tidyverse style of R coding and can be used in a wide range of data science applications.

You have:

  1. Loaded in an example dataset of animal locations and explored the data with some preliminary plots
  2. Produced a map of the study area in an appropriate spatial projection for the region
  3. Loaded in an example dive summary dataset and explored the data with some preliminary plots
  4. Error corrected and regularised the animal location data and combined them with dive summary data
  5. Extracted example environmental covariates from the regions the animals utilised and appended them to the dataframe
  6. Performed a preliminary statistical analysis to better understand the drivers of animal movement

Our analysis revealed that the seals moved northward, with some animals travelling as far as Baffin Bay and others crossing the Labrador Sea to Greenland. Animals performed both directed and undirected movements during this time. Our preliminary analysis did not reveal a strong link with bathymetry; animals performed both directed and undirected movement in shallower water, but only performed directed movements when travelling over deep water.

The manuscript that inspired this practical R tutorial has recently been accepted for publication in Royal Society Open Science. This journal is open access, uses open peer review (the reviews, decision letter and author reponses are also published) and encourages open data through the archiving of R scipts in Zenodo and data in the Dryad digital repository.

The manuscript should be available in the next month, here’s the reference to keep a look out for:

Grecian, W. J., Stenson, G. B., Biuw, M., Boehme, L., Folkow, L. P., Goulet, P. J., Jonsen, I. D., Malde, A., Nordøy, E. S., Rosing-Asvid, A. & Smout, S. (In Press) Environmental drivers of population-level variation in the migratory and diving ontogeny of an Arctic top predator Royal Society Open Science https://doi.org/10.1098/rsos.211042